14 research outputs found

    Machine Learning for Intrusion Detection: Modeling the Distribution Shift

    This paper addresses two important issues that arise in formulating and solving computer intrusion detection as a machine learning problem, a topic that has attracted considerable attention in recent years, including a community-wide competition using a common data set known as the KDD Cup ’99. The first problem we address is the size of the data set, 5 × 10⁶ by 41 features, which makes conventional learning algorithms impractical. In previous work, we introduced a one-pass non-parametric classification technique called Voted Spheres, which carves up the input space into a series of overlapping hyperspheres. Training data seen within each hypersphere is used in a voting scheme during testing on unseen data. Secondly, we address the problem of distribution shift, whereby the training and test data may be drawn from slightly different probability densities while the conditional densities of class membership for a given datum remain the same. We adopt two recent techniques from the literature, density weighting and kernel mean matching, to enhance the Voted Spheres technique to deal with such distribution disparities. We demonstrate that substantial performance gains can be achieved using these techniques on the KDD Cup data set.
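    The abstract above describes Voted Spheres only in outline, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes a fixed radius, Euclidean distance, a new sphere opened on any training point that no existing sphere covers, and a nearest-sphere fallback at test time. The class name `VotedSpheresSketch`, its methods, and the optional per-point `weight` (a stand-in for a density weight under distribution shift) are all hypothetical.

```python
import math
from collections import defaultdict


class VotedSpheresSketch:
    """Illustrative one-pass hypersphere voting classifier (not the published method)."""

    def __init__(self, radius):
        self.radius = radius  # assumed fixed radius shared by all spheres
        self.centres = []     # one centre per sphere
        self.votes = []       # per sphere: class label -> accumulated weight

    def partial_fit(self, x, label, weight=1.0):
        """Process a single training point; each point is visited exactly once."""
        covered = False
        for centre, tally in zip(self.centres, self.votes):
            if math.dist(x, centre) <= self.radius:
                tally[label] += weight  # the point votes in every sphere containing it
                covered = True
        if not covered:
            # No existing sphere contains x, so open a new sphere centred on it.
            tally = defaultdict(float)
            tally[label] += weight
            self.centres.append(tuple(x))
            self.votes.append(tally)

    def predict(self, x):
        """Sum the votes of all spheres containing x; else use the nearest sphere."""
        if not self.centres:
            raise ValueError("model has not seen any training data")
        totals = defaultdict(float)
        for centre, tally in zip(self.centres, self.votes):
            if math.dist(x, centre) <= self.radius:
                for label, w in tally.items():
                    totals[label] += w
        if not totals:
            nearest = min(range(len(self.centres)),
                          key=lambda i: math.dist(x, self.centres[i]))
            totals = self.votes[nearest]
        return max(totals, key=totals.get)


# Toy usage with made-up two-dimensional points and labels.
model = VotedSpheresSketch(radius=1.0)
for point, label in [((0.0, 0.0), "normal"), ((0.2, 0.1), "normal"),
                     ((5.0, 5.0), "attack"), ((5.1, 4.9), "attack")]:
    model.partial_fit(point, label)

print(model.predict((0.1, 0.0)))
print(model.predict((4.8, 5.2)))
```

    Passing a `weight` other than 1.0 to `partial_fit` is one simple way to let an externally estimated density ratio influence the vote; how the paper actually combines density weighting or kernel mean matching with the spheres is not stated in the abstract.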

    Cardiovascular disease, cancer and mortality among people with type 2 diabetes and alcoholic or non-alcoholic fatty liver disease hospital admission

    OBJECTIVE: To describe associations between alcoholic fatty liver disease (ALD) or non-alcoholic fatty liver disease (NAFLD) hospital admission and cardiovascular disease (CVD), cancer, and mortality in people with T2DM. RESEARCH DESIGN AND METHODS: We performed a retrospective cohort study using linked population-based routine data from the diabetes register, hospital, cancer and death records for people aged 40-89 years, diagnosed with T2DM in Scotland 2004-2013, who had one or more hospital admission records. Liver disease and outcomes were identified using International Classification of Diseases codes. We estimated hazard ratios from Cox proportional hazards models, adjusted for key risk factors (aHRs). RESULTS: There were 134,368 people with T2DM (1707 with ALD and 1452 with NAFLD) with mean follow-up of 4.3 years for CVD and 4.7 years for mortality. Among people with ALD, NAFLD or without liver disease hospital records respectively there were: 378, 320 and 21,873 CVD events; 268, 176 and 15,101 cancers; and 724, 221 and 16,203 deaths. For ALD and NAFLD respectively, aHRs (95% CIs) compared to the group with no record of liver disease were: 1.59 (1.43, 1.76) and 1.70 (1.52, 1.90) for CVD; 40.3 (28.8, 56.5) and 19.12 (11.71, 31.2) for hepatocellular cancer (HCC); 1.28 (1.12, 1.47) and 1.10 (0.94, 1.29) for non-HCC cancer; 4.86 (4.50, 5.24) and 1.60 (1.40, 1.83) for all-cause mortality. CONCLUSIONS: Hospital records of ALD or NAFLD are associated, to varying degrees, with increased risk of CVD, cancer and mortality in people with T2DM.

    Biomarkers of rapid chronic kidney disease progression in type 2 diabetes.

    Here we evaluated the performance of a large set of serum biomarkers for the prediction of rapid progression of chronic kidney disease (CKD) in patients with type 2 diabetes. We used a case-control design nested within a prospective cohort of patients with baseline eGFR 30-60 ml/min per 1.73 m². Within a 3.5-year period of Go-DARTS study patients, 154 had over a 40% eGFR decline and 153 controls maintained over 95% of baseline eGFR. A total of 207 serum biomarkers were measured, and logistic regression with forward selection was used to choose a subset that maximized prediction on top of clinical variables including age, gender, hemoglobin A1c, eGFR, and albuminuria. Nested cross-validation determined the best number of biomarkers to retain and evaluated predictive performance. Ultimately, 30 biomarkers showed significant associations with rapid progression after adjustment for clinical characteristics. A panel of 14 biomarkers increased the area under the ROC curve from 0.706 (clinical data alone) to 0.868. Biomarkers selected included fibroblast growth factor-21, the symmetric to asymmetric dimethylarginine ratio, β2-microglobulin, C16-acylcarnitine, and kidney injury molecule-1. Use of more extensive clinical data including prebaseline eGFR slope improved prediction but to a lesser extent than biomarkers (area under the ROC curve of 0.793). Thus we identified several novel associations of biomarkers with CKD progression and the utility of a small panel of biomarkers to improve prediction. We acknowledge all the SUMMIT partners (http://www.imi-summit.eu/) for their assistance with this project. This work was funded by the Innovative Medicine Initiative under grant agreement no. IMI/115006 (the SUMMIT consortium) and the Go-DARTS cohort was funded by the Chief Scientists Office Scotland. This is the accepted manuscript of a paper published in Kidney International (Looker et al., Kidney International, 2015 doi: 10.1038/ki.2015.199).
The final version is available at http://dx.doi.org/10.1038/ki.2015.19

    Apolipoprotein CIII and N-terminal prohormone b-type natriuretic peptide as independent predictors for cardiovascular disease in type 2 diabetes

    Background and aims: Developing sparse panels of biomarkers for cardiovascular disease in type 2 diabetes would enable risk stratification for clinical decision making and selection into clinical trials. We examined the individual and joint performance of five candidate biomarkers for incident cardiovascular disease (CVD) in type 2 diabetes that an earlier discovery study had yielded. Methods: Apolipoprotein CIII (apoCIII), N-terminal prohormone B-type natriuretic peptide (NT-proBNP), high sensitivity Troponin T (hsTnT), Interleukin-6, and Interleukin-15 were measured in baseline serum samples from the Collaborative Atorvastatin Diabetes trial (CARDS) of atorvastatin versus placebo. Among 2105 persons with type 2 diabetes and median age of 62.9 years (range 39.2–77.3), there were 144 incident CVD (acute coronary heart disease or stroke) cases during the maximum 5-year follow up. We used Cox Proportional Hazards models to identify biomarkers associated with incident CVD and the area under the receiver operating characteristic curves (AUROC) to assess overall model prediction. Results: Three of the biomarkers were singly associated with incident CVD independently of other risk factors: NT-proBNP (Hazard Ratio per standardised unit 2.02, 95% Confidence Interval [CI] 1.63, 2.50), apoCIII (1.34, 95% CI 1.12, 1.60) and hsTnT (1.40, 95% CI 1.16, 1.69). When combined in a single model, only NT-proBNP and apoCIII were independent predictors of CVD, together increasing the AUROC using Framingham risk variables from 0.661 to 0.745. Conclusions: The biomarkers NT-proBNP and apoCIII substantially increment the prediction of CVD in type 2 diabetes beyond that obtained with the variables used in the Framingham risk score.

    Risk of acute kidney injury and survival in patients treated with Metformin:an observational cohort study

    Background: Whether metformin precipitates lactic acidosis in patients with chronic kidney disease (CKD) remains under debate. We examined whether metformin use was associated with an increased risk of acute kidney injury (AKI) as a proxy for lactic acidosis and whether survival among those with AKI varied by metformin exposure. Methods: All individuals with type 2 diabetes and available prescribing data between 2004 and 2013 in Tayside, Scotland were included. The electronic health record for diabetes, which includes issued prescriptions, was linked to laboratory biochemistry, hospital admission, death register and Scottish Renal Registry data. AKI events were defined using the Kidney Disease Improving Global Outcomes criteria, with a rise in serum creatinine of at least 26.5 μmol/l or a rise of greater than 150% from baseline, for all hospital admissions. Cox regression analyses were used to examine whether person-time periods in which current metformin exposure occurred were associated with an increased rate of first AKI compared to unexposed periods. Cox regression was also used to compare 28-day survival rates following first AKI events in those exposed to metformin versus those not exposed. Results: Twenty-five thousand one hundred forty-eight patients were included with a total person-time of 126,904 person years. 4944 (19.7%) people had at least one episode of AKI during the study period. There were 32.4 cases of first AKI/1000pyrs in current metformin exposed person-time periods compared to 44.9 cases/1000pyrs in unexposed periods. After adjustment for age, sex, diabetes duration, calendar time, number of diabetes drugs and baseline renal function, current metformin use was not associated with AKI incidence, HR 0.94 (95% CI 0.87, 1.02, p = 0.15).
Among those with incident AKI, being on metformin at admission was associated with a higher rate of survival at 28 days (HR 0.81, 95% CI 0.69, 0.94, p = 0.006) even after adjustment for age, sex, pre-admission eGFR, HbA1c and diabetes duration. Conclusions: Contrary to common perceptions, we found no evidence that metformin increases the incidence of AKI; metformin use was associated with higher 28-day survival following incident AKI.

    Serum kidney injury molecule 1 and β2-microglobulin perform as well as larger biomarker panels for prediction of rapid decline in renal function in type 2 diabetes

    Aims/hypothesis: As part of the Surrogate Markers for Micro- and Macrovascular Hard Endpoints for Innovative Diabetes Tools (SUMMIT) programme we previously reported that large panels of biomarkers derived from three analytical platforms maximised prediction of progression of renal decline in type 2 diabetes. Here, we hypothesised that smaller (n ≤ 5), platform-specific combinations of biomarkers selected from these larger panels might achieve similar prediction performance when tested in three additional type 2 diabetes cohorts. Methods: We used 657 serum samples, held under differing storage conditions, from the Scania Diabetes Registry (SDR) and Genetics of Diabetes Audit and Research Tayside (GoDARTS), and a further 183 nested case–control sample set from the Collaborative Atorvastatin in Diabetes Study (CARDS). We analysed 42 biomarkers measured on the SDR and GoDARTS samples by a variety of methods including standard ELISA, multiplexed ELISA (Luminex) and mass spectrometry. The subset of 21 Luminex biomarkers was also measured on the CARDS samples. We used the event definition of loss of >20% of baseline eGFR during follow-up from a baseline eGFR of 30–75 ml min⁻¹ [1.73 m]⁻². A total of 403 individuals experienced an event during a median follow-up of 7 years. We used discrete-time logistic regression models with tenfold cross-validation to assess association of biomarker panels with loss of kidney function. Results: Twelve biomarkers showed significant association with eGFR decline adjusted for covariates in one or more of the sample sets when evaluated singly. Kidney injury molecule 1 (KIM-1) and β2-microglobulin (B2M) showed the most consistent effects, with standardised odds ratios for progression of at least 1.4 (p < 0.0003) in all cohorts.
A combination of B2M and KIM-1 added to clinical covariates, including baseline eGFR and albuminuria, modestly improved prediction, increasing the area under the curve in the SDR, Go-DARTS and CARDS by 0.079, 0.073 and 0.239, respectively. Neither the inclusion of additional Luminex biomarkers on top of B2M and KIM-1 nor a sparse mass spectrometry panel, nor the larger multiplatform panels previously identified, consistently improved prediction further across all validation sets. Conclusions/interpretation: Serum KIM-1 and B2M independently improve prediction of renal decline from an eGFR of 30–75 ml min⁻¹ [1.73 m]⁻² in type 2 diabetes beyond clinical factors and prior eGFR and are robust to varying sample storage conditions. Larger panels of biomarkers did not improve prediction beyond these two biomarkers.

    One-pass algorithms for large and shifting data sets

    For many problem domains, practitioners are faced with the problem of ever-increasing amounts of data. Examples include the UniProt database of proteins, which now contains ~6 million sequences, and the KDD ’99 data, which consists of ~5 million points. At these scales, the state-of-the-art machine learning techniques are not applicable since the multiple passes they require through the data are prohibitively expensive, and a need for different approaches arises. Another issue arising in real-world tasks, which is only recently becoming a topic of interest in the machine learning community, is distribution shift, which occurs naturally in many problem domains such as intrusion detection and EEG signal mapping in the Brain-Computer Interface domain. This means that the i.i.d. assumption between the training and test data does not hold, causing classifiers to perform poorly on the unseen test set. We first present a novel, hierarchical, one-pass clustering technique that is capable of handling very large data. Our experiments show that the quality of the clusters generated by our method does not degrade, while making vast computational savings compared to algorithms that require multiple passes through the data. We then propose Voted Spheres (VS), a novel, non-linear, one-pass, multi-class classification technique capable of handling millions of points in minutes. Our empirical study shows that it achieves state-of-the-art performance on real-world data sets, in a fraction of the time required by other methods. We then adapt the VS to deal with covariate shift between the training and test phases using two different techniques: an importance weighting scheme and kernel mean matching. Our results on a toy problem and the real-world KDD ’99 data show an increase in the performance of our VS framework.
Our final contribution involves applying the one-pass VS algorithm, along with the adapted counterpart (for covariate shift), to the Brain-Computer Interface domain, in which linear batch algorithms are generally used. Our VS-based methods outperform the SVM, and perform very competitively with the submissions of a recent BCI competition, which further shows the robustness of our proposed techniques to different problem domains

    Impact of hypertension on the association of BMI with risk and age at onset of type 2 diabetes mellitus: age- and gender-mediated modifications.

    AIMS: Given that BMI correlates with risk of Type 2 diabetes mellitus (T2DM), and that hypertension is a common comorbid condition, we hypothesize that hypertension significantly augments the impact of obesity on T2DM onset. METHODS: We obtained data on T2DM in Kuwaiti natives from the Kuwait Health Network Registry. We considered 1339 comorbid individuals with onset of hypertension preceding that of T2DM, and 3496 non-hypertensive individuals with T2DM. Multiple linear regressions, ANOVA tests, and Cox proportional hazards models were used to quantify the impact of hypertension on the correlation of BMI with age at onset and risk of T2DM. RESULTS: The impact of increasing levels of BMI on age at onset of T2DM is augmented in patients diagnosed with hypertension. We find that the slope of the inverse linear relationship between BMI and onset age of T2DM is much steeper in hypertensive patients (-0.69, males and -0.39, females) than in non-hypertensive patients (-0.36, males and -0.17, females). The decline in onset age for a unit increase of BMI is twice as large in males as in females. Upon considering BMI as a categorical variable, we find that while the mean onset age of T2DM in hypertensive patients decreases by as much as 5-12 years in every higher BMI category, a significant decrease in non-hypertensive patients exists only in the severely obese. Hazard due to hypertension (against the baseline of non-hypertension and normal weight) increases at least two-fold in every obese category. While males have higher hazard due to hypertension in early adulthood, females have higher hazard in late adulthood. CONCLUSION: A pre-existing condition of hypertension augments the association of BMI with Type 2 diabetes onset in both males and females. The presented results provide health professionals with directives on the extent of weight loss required to delay onset of Type 2 diabetes in hypertensive versus non-hypertensive patients.

    Summary of descriptive statistics of the data sets used in the study.

    @, The mean values presented in the previous two columns are compared using t-test.